Skip to content

fix(async-op): return 404 when bank doesn't exist instead of raw FK 500#2352

Merged
nicoloboschi merged 1 commit into
mainfrom
fix/async-op-submit-bank-not-found
Jun 23, 2026
Merged

fix(async-op): return 404 when bank doesn't exist instead of raw FK 500#2352
nicoloboschi merged 1 commit into
mainfrom
fix/async-op-submit-bank-not-found

Conversation

@cdbartholomew

Copy link
Copy Markdown
Contributor

Summary

_submit_async_operation (the helper behind submit_async_consolidation,
submit_async_retain, submit_async_file_retain, and
submit_async_refresh_mental_model) always INSERTs into async_operations,
which has an FK to banks(bank_id). If a caller submits for a bank that
doesn't exist — race against a concurrent bank deletion, or an integration
that derives bank IDs and submits before the bank's CREATE has been
issued — the INSERT raises asyncpg.exceptions.ForeignKeyViolationError.

The FastAPI endpoints wrap submit calls in a generic except Exception and
return HTTP 500 with the FK violation in the detail. This is a client-side
error misclassified as a server fault. It pollutes 5xx error-rate metrics
with what should be 404s, and exposes asyncpg internals to API consumers.

Fix

Add a bank-existence precheck at the top of the INSERT path in both branches
of _submit_async_operation:

  • dedupe_by_bank=True already runs
    SELECT 1 FROM banks WHERE bank_id = $1 FOR NO KEY UPDATE for
    serialization (issue Consolidation: large backlogs don't drain to completion; concurrency decays to ~1 despite WORKER_CONSOLIDATION_MAX_SLOTS #1842). Switch it from conn.execute() to
    conn.fetchval() so the returned value gates existence — same SQL,
    same lock semantics, but the rowcount becomes observable. If
    bank_exists is None, raise OperationValidationError(404).

  • dedupe_by_bank=False (scoped consolidation submits, etc.) had no
    lock and no check; add a plain SELECT 1 FROM banks for existence only.
    Same OperationValidationError(404) on miss.

The FastAPI endpoints' existing handler already converts
OperationValidationError to HTTPException(status_code=e.status_code, detail=e.reason)no API-layer changes needed.

Test plan

hindsight-api-slim/tests/test_async_submit_bank_not_found.py:

  • test_consolidation_submit_on_missing_bank_raises_validation_error
    unscoped submit on a non-existent bank asserts OperationValidationError
    with status_code == 404 and bank ID in the reason.
  • test_scoped_consolidation_submit_on_missing_bank_raises_validation_error
    same but with observation_scopes set (takes the dedupe_by_bank=False
    branch).
  • test_graph_maintenance_on_missing_bank_short_circuits — pins the
    pre-existing behaviour that submit_async_graph_maintenance returns
    {"operation_id": None, "no_work": True} before reaching the INSERT
    (its own queue precheck already covers the FK case).

Regression sweep on adjacent suites — 59/59 passing:

  • test_async_submit_bank_not_found.py (new) — 3
  • test_consolidation_submit_atomic_dedup.py — verifies dedup lock semantics
    (lock behaviour was the only thing I worried about touching). All 4 pass.
  • test_consolidation_retry_dedup_by_bank.py
  • test_consolidation_silent_requeue_failure.py
  • test_consolidation_scope_parallelism.py
  • test_op_cancellation.py
  • test_async_batch_retain.py
  • test_async_retain_tags.py

ruff check + ruff format clean. Pre-commit hooks pass.

Compatibility

  • Postgres: no behavior change for normal flow (bank exists). The
    FOR NO KEY UPDATE lock is unchanged.
  • Oracle: same — the FOR NO KEY UPDATE -> FOR UPDATE rewrite is
    identical, and the existence check is a plain SELECT 1.
  • API contract: existing 500 path on a missing-bank submit becomes a 404.
    Callers that were relying on the 500 (unlikely — it surfaced asyncpg
    internals) should already be retrying or surfacing the error.

`_submit_async_operation` always INSERTs into `async_operations`, which has
an FK to `banks(bank_id)`. Callers that race against bank deletion, or that
derive bank IDs before creating the bank (an integration that submits
`/consolidate` on a freshly-named bank before its CREATE has been issued),
hit `asyncpg.exceptions.ForeignKeyViolationError` out of the INSERT. The
FastAPI endpoints' generic `except Exception` then surfaces it as an opaque
500 — but the root cause is a client misuse, not a server fault.

Add a bank-existence precheck at the top of the INSERT path in both branches:

- `dedupe_by_bank=True` already runs `SELECT 1 FROM banks WHERE bank_id = $1
  FOR NO KEY UPDATE` (for serialization, issue #1842). Switch it from
  `execute` to `fetchval` so the rowcount also gates existence — preserves
  the lock semantics, just adds a check on the returned value.
- `dedupe_by_bank=False` (scoped submits) previously had no lock and no
  check; add a plain `SELECT 1 FROM banks` for existence only.

When the bank is missing, raise `OperationValidationError(404)`. The
endpoint's existing `except OperationValidationError` clause already
converts that to `HTTPException(status_code=e.status_code, detail=e.reason)`
— no API-layer changes needed.

Tests:
- 2 regression tests for `submit_async_consolidation` (unscoped + scoped)
  against a missing bank — assert OperationValidationError with status_code=404.
- 1 pin test for `submit_async_graph_maintenance`, which has its own
  pre-INSERT short-circuit (empty queue → no_work=True) that already
  avoided the FK error.

Verified that the existing dedup atomicity tests
(`test_consolidation_submit_atomic_dedup.py`,
`test_consolidation_retry_dedup_by_bank.py`) still pass — the lock
semantics on the dedupe branch are unchanged.
@nicoloboschi nicoloboschi merged commit 04703d2 into main Jun 23, 2026
193 of 196 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants